[CHAT-645] Ensure Faithfulness evaluation sends all retrieved sources in the correct format #1124

Open
davidgisbey wants to merge 2 commits into main from 645-faithfulness-fixes

@davidgisbey davidgisbey commented May 21, 2026

Description

There are a couple of issues Nick identified in our Faithfulness evaluation:

  1. We're not sending all the chunks that were sent to the LLM to generate the answer. Currently, we only send the chunks from used sources.
  2. We're not sending the chunks to the LLM in the correct format. We're using the #plain_content method, which strips the HTML out of the content, including the inline links.

I've updated the evaluation to send all the sources that we send to the structured answer comp LLM call, and updated the chunk content to mirror the implementation in the evaluation repo.

Jira ticket

https://gdsgovukagents.atlassian.net/jira/software/c/projects/CHAT/boards/269?label=Backend_Dev&selectedIssue=CHAT-645

Previously, we've only been sending the used sources to the faithfulness
evaluation, which is out of line with the evaluation repo.

Part of what we want to evaluate is whether the model is correctly
identifying which sources are relevant, so we should be sending all
retrieved sources.
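
To illustrate the difference this commit describes, here is a minimal Ruby sketch. It is not the code from lib/auto_evaluation/faithfulness.rb: the `Source` struct, its `used` flag, and both method names are hypothetical stand-ins chosen for the example.

```ruby
# Hypothetical stand-in for a retrieved source; not the project's real model.
Source = Struct.new(:title, :used, keyword_init: true)

# Behaviour described as the previous one: only sources marked as used
# are passed to the faithfulness evaluation.
def used_sources_only(sources)
  sources.select(&:used)
end

# Behaviour described as the new one: every retrieved source is passed on,
# whether or not it was marked as used.
def all_retrieved_sources(sources)
  sources.to_a
end

sources = [
  Source.new(title: "Guide A", used: true),
  Source.new(title: "Guide B", used: false),
  Source.new(title: "Guide C", used: true),
]

puts used_sources_only(sources).map(&:title).inspect
puts all_retrieved_sources(sources).map(&:title).inspect
```

With the second approach the evaluator also sees "Guide B", so it can judge whether leaving that source unused was the right call.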
@govuk-ci govuk-ci temporarily deployed to govuk-chat-645-faithful-as3vob May 21, 2026 13:34 Inactive
@davidgisbey davidgisbey force-pushed the 645-faithfulness-fixes branch from d25c224 to 6c43f17 Compare May 21, 2026 13:35
@govuk-ci govuk-ci temporarily deployed to govuk-chat-645-faithful-as3vob May 21, 2026 13:35 Inactive
@davidgisbey davidgisbey force-pushed the 645-faithfulness-fixes branch from 6c43f17 to 37b2971 Compare May 21, 2026 13:39
@govuk-ci govuk-ci temporarily deployed to govuk-chat-645-faithful-as3vob May 21, 2026 13:40 Inactive
@davidgisbey davidgisbey force-pushed the 645-faithfulness-fixes branch from 37b2971 to 5a66d7d Compare May 21, 2026 13:54
@govuk-ci govuk-ci temporarily deployed to govuk-chat-645-faithful-as3vob May 21, 2026 13:54 Inactive
Comment thread lib/auto_evaluation/faithfulness.rb
Comment thread lib/auto_evaluation/faithfulness.rb
@davidgisbey davidgisbey force-pushed the 645-faithfulness-fixes branch from 5a66d7d to 850a6e6 Compare May 21, 2026 15:11
@govuk-ci govuk-ci temporarily deployed to govuk-chat-645-faithful-as3vob May 21, 2026 15:11 Inactive
We were sending the plain content of the chunk to the LLM, which meant
that the LLM wasn't actually getting the full context in the sources.

The reason for this is that the plain_content method on the chunk
strips out inline links. This updates the code to use the same
format as the evaluation repo, which sends the HTML content of the chunk
to the LLM.
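
A small Ruby sketch of why the plain content loses context. The `Chunk` struct here is hypothetical, and its `plain_content` is a naive tag strip written only for illustration; the real #plain_content implementation is not shown in this PR page and may differ.

```ruby
# Hypothetical chunk; the real class and its accessors may look different.
Chunk = Struct.new(:html_content, keyword_init: true) do
  # Illustrative stand-in only: remove anything that looks like an HTML tag.
  def plain_content
    html_content.gsub(/<[^>]+>/, "")
  end
end

chunk = Chunk.new(
  html_content: '<p>Apply using <a href="/apply-online">the online form</a>.</p>',
)

# The link text survives, but the link target is gone.
puts chunk.plain_content

# The HTML keeps the href, so the evaluator can see where the link points.
puts chunk.html_content
```

If an answer cites the link target, an evaluator that only receives the stripped text has no way to confirm the link came from the source.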

@langenk langenk left a comment

Great, looks good to me now. :)

4 participants